Abstract

We investigate the asymptotic behavior of posterior distributions of regression coefficients in
high-dimensional linear models as the number of dimensions grows with the number of observations.
Assuming a sparse true model, we give sufficient conditions for strong posterior consistency and
provide examples with popular shrinkage priors.
1 Introduction

Consider the linear model $y_n = X_n\beta_n^0 + \varepsilon_n$, where $y_n$ is an $n$-dimensional
vector of responses, $X_n$ is the $n \times p_n$ design matrix, $\varepsilon_n \sim N(0, \sigma^2 I_n)$
with known $\sigma^2$, and some of the components of $\beta_n^0$ are zero. Let
$\mathcal{A}_n = \{j : \beta_{nj}^0 \neq 0,\ j = 1, \ldots, p_n\}$ and $|\mathcal{A}_n| = q_n$ denote
the set of indices and the number of nonzero elements in $\beta_n^0$.
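To make the notation concrete, the following is a minimal simulation sketch of the sparse linear model. The Gaussian design, the choice of which coefficients are nonzero, and the helper names `simulate_sparse_linear_model` and `active_set` are assumptions made for illustration only; indices are 0-based here, whereas the text indexes coefficients from 1.

```python
import random

def simulate_sparse_linear_model(n, p, q, sigma=1.0, seed=0):
    """Draw (y, X, beta0) from y = X beta0 + eps with q nonzero coefficients."""
    rng = random.Random(seed)
    # Hypothetical design: i.i.d. standard normal entries.
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # Hypothetical truth: the first q coefficients are nonzero, the rest are zero.
    beta0 = [rng.choice([-2.0, 2.0]) if j < q else 0.0 for j in range(p)]
    # Responses with N(0, sigma^2) noise and known sigma.
    y = [sum(x * b for x, b in zip(row, beta0)) + rng.gauss(0.0, sigma)
         for row in X]
    return y, X, beta0

def active_set(beta0):
    """Indices of the nonzero coefficients (the set A_n); its size is q_n."""
    return [j for j, b in enumerate(beta0) if b != 0.0]

y, X, beta0 = simulate_sparse_linear_model(n=200, p=50, q=5)
A_n = active_set(beta0)
q_n = len(A_n)
print(A_n, q_n)
```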

To justify Bayesian high-dimensional regression, it is important to establish posterior consistency when
$p_n \to \infty$ as $n \to \infty$. Our main contribution is a simple sufficient condition on the prior
concentration for strong posterior consistency when $p_n = o(n)$. Our particular focus is on shrinkage
priors, including the Laplace, Student t, generalized double Pareto, and horseshoe-type priors
(Johnstone & Silverman, 2004; Griffin & Brown, 2007; Carvalho et al., 2010; Armagan et al., 2011a).
There is a rich methodological and applied literature supporting such priors but a lack of theoretical
results.

Ghosal (1999) and Bontemps (2011) provide results on asymptotic normality of the posterior of $\beta_n$
in linear models for $p_n^4 \log p_n = o(n)$ and $p_n \le n$, respectively. As a corollary, Ghosal (1999)
also states posterior consistency in linear models when $p_n^3 \log n / n \to 0$ under the usual
assumptions on $X_n$. However, both Ghosal (1999) and Bontemps (2011) require Lipschitz conditions
ensuring that the prior is sufficiently flat in a neighborhood of the true $\beta_n^0$. Such conditions
are restrictive when using shrinkage priors that are designed to concentrate on sparse $\beta_n$ vectors.
Jiang (2007) instead studies convergence rates in estimating the predictive distribution of $y_n$ given
$X_n$ obtained from Bayesian variable selection in $p_n \gg n$ settings for generalized linear models,
but does not consider the posterior of $\beta_n$. To our knowledge, there are no asymptotic results
supporting the use of shrinkage priors for $\beta_n$ in high-dimensional settings.
2 Sufficient Conditions for Posterior Consistency

Our results on posterior consistency rely on the following assumptions as $n \to \infty$.

(A1) $p_n = o(n)$.

(A2) Let $\Lambda_{n\min}$ and $\Lambda_{n\max}$ be the smallest and the largest singular values of $X_n$, respectively. Then

$$0 < \Lambda_{\min} < \liminf_{n\to\infty} \Lambda_{n\min}/\sqrt{n} \le \limsup_{n\to\infty} \Lambda_{n\max}/\sqrt{n} < \Lambda_{\max} < \infty.$$

(A3) $\sup_{j=1,\ldots,p_n} |\beta_{nj}^0| < \infty$.

(A4) $q_n = o\{n^{1-\rho/2}/(\sqrt{p_n}\log n)\}$ for $\rho \in (0, 2)$.

(A5) $q_n = o(n/\log n)$.
Assumptions (A4) and (A5) will be used in different settings. 
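As a rough numerical illustration of what (A4) and (A5) ask of the sparsity level, the sketch below evaluates $q_n$ relative to each envelope along one growth regime. The regime $p_n = \sqrt{n}$, $q_n = \log n$, $\rho = 1$ and the function names are hypothetical choices, not taken from the text; both ratios should eventually shrink towards zero if the assumptions hold.

```python
import math

def a4_ratio(n, p_n, q_n, rho):
    """q_n divided by the (A4) envelope n^(1 - rho/2) / (sqrt(p_n) * log n)."""
    envelope = n ** (1.0 - rho / 2.0) / (math.sqrt(p_n) * math.log(n))
    return q_n / envelope

def a5_ratio(n, q_n):
    """q_n divided by the (A5) envelope n / log n."""
    return q_n / (n / math.log(n))

# Hypothetical growth regime: p_n = sqrt(n), q_n = log n, rho = 1.
for n in (10**3, 10**6, 10**12):
    p_n, q_n = math.sqrt(n), math.log(n)
    print(n, a4_ratio(n, p_n, q_n, rho=1.0), a5_ratio(n, q_n))
```

In this regime the (A4) ratio decays only like $(\log n)^2/n^{1/4}$, so it is still above one at moderate $n$; the assumptions are asymptotic statements.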

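Lemma 1. Let $\mathcal{B}_n := \{\beta_n : \|\beta_n - \beta_n^0\| > \epsilon\}$ where $\epsilon > 0$. To test
$H_0 : \beta_n = \beta_n^0$ against $H_1 : \beta_n \in \mathcal{B}_n$, we define a test function
$\Phi_n(y_n) = 1(y_n \in \mathcal{C}_n)$ where the critical region is
$\mathcal{C}_n := \{y_n : \|\hat\beta_n - \beta_n^0\| > \epsilon/2\}$ and
$\hat\beta_n = (X_n^T X_n)^{-1} X_n^T y_n$. Then, under assumptions (A1) and (A2), as $n \to \infty$,

1. $E_{\beta_n^0}(\Phi_n) \le \exp\{-\epsilon^2 n \Lambda_{\min}^2/(16\sigma^2)\}$,

2. $\sup_{\beta_n \in \mathcal{B}_n} E_{\beta_n}(1 - \Phi_n) \le \exp\{-\epsilon^2 n \Lambda_{\min}^2/(16\sigma^2)\}$.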

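Theorem 1. Under assumptions (A1) and (A2), the posterior of $\beta_n$ under prior $\Pi_n(\beta_n)$ is strongly
consistent, that is, for any $\epsilon > 0$,
$\Pi_n(\mathcal{B}_n \mid y_n) = \Pi_n(\beta_n : \|\beta_n - \beta_n^0\| > \epsilon \mid y_n) \to 0$
$\mathrm{pr}_{\beta_n^0}$-almost surely as $n \to \infty$, if

$$\Pi_n\left(\beta_n : \|\beta_n - \beta_n^0\| < \frac{\Delta}{n^{\rho/2}}\right) > \exp(-dn)$$

for all $0 < \Delta < \epsilon^2\Lambda_{\min}^2/(48\Lambda_{\max}^2)$ and
$0 < d < \epsilon^2\Lambda_{\min}^2/(32\sigma^2) - 3\Delta\Lambda_{\max}^2/(2\sigma^2)$ and some $\rho > 0$.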

Theorem 1 provides a simple sufficient condition on the concentration of the prior around sparse $\beta_n^0$. We
use Theorem 1 to provide conditions under which specific shrinkage priors achieve strong posterior
consistency, focusing on priors that assume independent and identically distributed elements of $\beta_n$.

Theorem 2. Under assumptions (A1)-(A4), the Laplace prior
$f(\beta_{nj} \mid s_n) = (1/2s_n)\exp(-|\beta_{nj}|/s_n)$ with scale parameter $s_n$ yields a strongly
consistent posterior if $s_n = C/(\sqrt{p_n}\, n^{\rho/2}\log n)$ for finite $C > 0$.
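The prior concentration required by Theorem 1 can be checked coordinate by coordinate for the Laplace prior. The sketch below compares the exact prior mass of an interval around a true coefficient with the elementary lower bound "interval length times the smallest density value on the interval", which is the kind of bound the proof relies on. All numerical values and function names are hypothetical and chosen only for illustration.

```python
import math

def laplace_cdf(x, s):
    """CDF of the Laplace distribution with location 0 and scale s."""
    return 0.5 * math.exp(x / s) if x < 0 else 1.0 - 0.5 * math.exp(-x / s)

def interval_mass(beta0_j, delta, s):
    """Exact prior mass of {beta : |beta - beta0_j| < delta}."""
    return laplace_cdf(beta0_j + delta, s) - laplace_cdf(beta0_j - delta, s)

def interval_lower_bound(beta0_j, delta, s):
    """Interval length 2*delta times the density at the point farthest from 0."""
    return 2.0 * delta * math.exp(-(abs(beta0_j) + delta) / s) / (2.0 * s)

def laplace_scale(n, p_n, rho, C=1.0):
    """Scale s_n = C / (sqrt(p_n) * n^(rho/2) * log n), as in the theorem."""
    return C / (math.sqrt(p_n) * n ** (rho / 2.0) * math.log(n))

n, p_n, rho, Delta = 100, 20, 1.0, 1.0   # hypothetical values
s_n = laplace_scale(n, p_n, rho)
delta = Delta / (math.sqrt(p_n) * n ** (rho / 2.0))  # per-coordinate half-width
exact = interval_mass(0.05, delta, s_n)
bound = interval_lower_bound(0.05, delta, s_n)
print(s_n, delta, exact, bound)
```

The bound is loose but has a simple logarithm, which is what makes the dominating term in the proof easy to read off.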

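Student t prior: The density function for the scaled Student t distribution is

$$f(\beta_j \mid s, d_0) = \frac{1}{s\sqrt{d_0}\,B(1/2, d_0/2)}\left(1 + \frac{\beta_j^2}{s^2 d_0}\right)^{-(d_0+1)/2}$$

with scale $s$, degrees of freedom $d_0$, and $B(\cdot)$ denoting the beta function.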
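As a sanity check on the normalizing constant, the sketch below codes the scaled Student t density in its standard form, $\{s\sqrt{d_0}B(1/2,d_0/2)\}^{-1}\{1+\beta_j^2/(s^2 d_0)\}^{-(d_0+1)/2}$, and integrates it numerically. The parameter values, the integration range and the helper names are assumptions made for this illustration.

```python
import math

def beta_fn(a, b):
    """Beta function B(a, b) computed from gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def scaled_t_pdf(x, s, d0):
    """Scaled Student t density with scale s and d0 degrees of freedom."""
    norm = s * math.sqrt(d0) * beta_fn(0.5, d0 / 2.0)
    return (1.0 + x * x / (s * s * d0)) ** (-(d0 + 1.0) / 2.0) / norm

def trapezoid(f, lo, hi, steps):
    """Composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

s, d0 = 0.5, 3.0  # hypothetical values
total_mass = trapezoid(lambda x: scaled_t_pdf(x, s, d0),
                       -500.0 * s, 500.0 * s, 200_000)
print(total_mass)
```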

Theorem 3. Under assumptions (A1)-(A3) and (A5), the scaled Student t prior with parameters $s_n$
and $d_{0n}$ yields a strongly consistent posterior if $d_{0n} = d_0 \in (2, \infty)$ and
$s_n = C/(\sqrt{p_n}\, n^{\rho/2}\log n)$ for finite $\rho > 0$ and $C > 0$.

Generalized double Pareto prior: As defined by Armagan et al. (2011b), the generalized double Pareto
density is given by

$$f(\beta_j \mid \alpha, \eta) = \frac{\alpha}{2\eta}\left(1 + \frac{|\beta_j|}{\eta}\right)^{-(\alpha+1)}, \tag{1}$$

where $\alpha, \eta > 0$.
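For the generalized double Pareto density, written here as $(\alpha/2\eta)(1+|\beta_j|/\eta)^{-(\alpha+1)}$, the prior mass of a symmetric interval has a closed form, and the second moment used later in the proofs is $2\eta^2/\{(\alpha-1)(\alpha-2)\}$ for $\alpha > 2$. The sketch below checks the interval mass against numerical integration; function names and parameter values are hypothetical.

```python
import math

def gdp_pdf(x, alpha, eta):
    """Generalized double Pareto density with shape alpha and scale eta."""
    return alpha / (2.0 * eta) * (1.0 + abs(x) / eta) ** (-(alpha + 1.0))

def gdp_interval_mass(delta, alpha, eta):
    """Closed-form prior mass of (-delta, delta), from integrating gdp_pdf."""
    return 1.0 - (1.0 + delta / eta) ** (-alpha)

def gdp_second_moment(alpha, eta):
    """E(beta^2) = 2 eta^2 / ((alpha - 1)(alpha - 2)); finite only for alpha > 2."""
    if alpha <= 2.0:
        raise ValueError("second moment requires alpha > 2")
    return 2.0 * eta ** 2 / ((alpha - 1.0) * (alpha - 2.0))

def midpoint(f, lo, hi, steps):
    """Composite midpoint rule; with an even step count 0 is a cell boundary."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

alpha, eta, delta = 3.0, 0.2, 0.1  # hypothetical values
numeric_mass = midpoint(lambda x: gdp_pdf(x, alpha, eta), -delta, delta, 20_000)
closed_mass = gdp_interval_mass(delta, alpha, eta)
print(numeric_mass, closed_mass, gdp_second_moment(alpha, eta))
```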
Theorem 4. Under assumptions (A1)-(A3) and (A5), the generalized double Pareto prior with parameters
$\alpha_n$ and $\eta_n$ yields a strongly consistent posterior if $\alpha_n = \alpha \in (2, \infty)$ and
$\eta_n = C/(\sqrt{p_n}\, n^{\rho/2}\log n)$ for finite $\rho > 0$ and $C > 0$.



Horseshoe-like priors: As defined in Armagan et al. (2011a), generalized beta scale mixtures of normals
are obtained by the following three equivalent representations:

$$
\begin{aligned}
\beta_j &\sim N(0, \tau_j), \quad \tau_j \sim \ldots \\
\beta_j &\sim N(0, \tau_j), \quad f(\tau_j) = \frac{\Gamma(a_0+b_0)}{\Gamma(a_0)\Gamma(b_0)}\,\xi^{-a_0}\tau_j^{a_0-1}(1+\tau_j/\xi)^{-(a_0+b_0)} \\
\beta_j &\sim N(0, \tau_j), \quad \tau_j \sim \mathrm{Ga}(a_0, \lambda_j), \quad \lambda_j \sim \mathrm{Ga}(b_0, \xi)
\end{aligned}
\tag{2}
$$

where $a_0, b_0, \xi > 0$. Due to the representation in (2) and the work by Carvalho et al. (2010), we refer
to these priors as horseshoe-like. The above formulation yields a general family that covers special cases
discussed in Johnstone & Silverman (2004), Griffin & Brown (2007) and Carvalho et al. (2010). The
resulting marginal density on $\beta_j$ is

$$f(\beta_j \mid a_0, b_0, \xi) = \frac{\Gamma(b_0+1/2)\,\Gamma(a_0+b_0)\,U\{b_0+1/2,\ 3/2-a_0,\ \beta_j^2/(2\xi)\}}{(2\pi\xi)^{1/2}\,\Gamma(a_0)\,\Gamma(b_0)} \tag{3}$$

where $U(\cdot)$ denotes the confluent hypergeometric function of the second kind.
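The gamma-gamma hierarchy $\beta_j \sim N(0,\tau_j)$, $\tau_j \sim \mathrm{Ga}(a_0,\lambda_j)$, $\lambda_j \sim \mathrm{Ga}(b_0,\xi)$ is easy to simulate, which gives a quick check of the second moment $E(\beta_j^2) = \xi\,\Gamma(a_0+1)\Gamma(b_0-1)/\{\Gamma(a_0)\Gamma(b_0)\} = \xi a_0/(b_0-1)$ used in the proof of the last theorem. The sketch assumes that the second argument of each gamma distribution is a rate; this is an assumption consistent with that moment formula, not something stated explicitly in the text. Parameter values and function names are hypothetical.

```python
import random

def sample_beta(a0, b0, xi, rng):
    """One draw of beta_j from the hierarchy (rate parametrization assumed)."""
    lam = rng.gammavariate(b0, 1.0 / xi)    # lambda_j ~ Ga(b0, rate xi)
    tau = rng.gammavariate(a0, 1.0 / lam)   # tau_j ~ Ga(a0, rate lambda_j)
    return rng.gauss(0.0, tau ** 0.5)       # beta_j ~ N(0, tau_j)

def second_moment_closed_form(a0, b0, xi):
    """E(beta_j^2) = E(tau_j) = xi * a0 / (b0 - 1), valid for b0 > 1."""
    return xi * a0 / (b0 - 1.0)

rng = random.Random(1)
a0, b0, xi = 1.0, 5.0, 2.0   # hypothetical values; b0 > 4 keeps the Monte Carlo error small
draws = [sample_beta(a0, b0, xi, rng) for _ in range(200_000)]
mc_second_moment = sum(b * b for b in draws) / len(draws)
print(mc_second_moment, second_moment_closed_form(a0, b0, xi))
```

Note that `random.gammavariate` takes a shape and a scale, so rates are inverted before being passed in.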



Theorem 5. Under assumptions (A1)-(A3) and (A5), the prior in (3) with parameters
$a_{0n} = a_0 \in (0, \infty)$, $b_{0n} = b_0 \in (1, \infty)$ and $\xi_n$ yields a strongly consistent
posterior if $\xi_n = C/(p_n n^{\rho}\log n)$ for finite $\rho > 0$ and $C > 0$.

3 Final Remarks 

Our analysis is heavily dependent on the construction of good tests. Results can be extended utilizing
appropriate tests relying on an estimator with asymptotically vanishing probability of being outside of a
shrinking neighborhood of the truth. For instance, one could use results similar to Bickel et al. (2009)
given additional conditions on $X_n$. Theorem 7.2 of Bickel et al. (2009) states that



$$\mathrm{pr}_{\beta_n^0}\left\{\|\hat\beta_{nL} - \beta_n^0\|_1 > M a_n\left(\frac{\log p_n}{n}\right)^{1/2}\right\} \le p_n^{1 - a_n^2/8} \tag{4}$$

for $a_n > 2\sqrt{2}$ and for some $M > 0$, where $\hat\beta_{nL}$ denotes the Lasso estimator. Hence using (4), in a
similar fashion to Lemma 1, we can obtain consistent tests with an $\epsilon$-neighborhood contracting at a rate
$O\{(a_n^2\log p_n)^{1/2}/\sqrt{n}\}$. Assuming $q_n < \infty$ for simplicity and letting $a_n^2 = O(\log n)$,
following Theorems 1, 3, 4 and 5, we anticipate that under the Student t, generalized double Pareto and
horseshoe-like priors, a near-optimal contraction rate of $O\{(\log n\log p_n)^{1/2}/\sqrt{n}\}$ is possible.
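To get a feel for the anticipated rate $(\log n \log p_n)^{1/2}/\sqrt{n}$, the sketch below evaluates it along a hypothetical regime $p_n = n^{0.9}$, which satisfies $p_n = o(n)$. The regime and the function name are illustrative assumptions, and constants are ignored.

```python
import math

def anticipated_rate(n, p_n):
    """(log n * log p_n)^(1/2) / sqrt(n), ignoring constants."""
    return math.sqrt(math.log(n) * math.log(p_n) / n)

# Hypothetical regime: p_n = n^0.9.
rates = {n: anticipated_rate(n, n ** 0.9) for n in (10**3, 10**4, 10**5, 10**6)}
for n, r in rates.items():
    print(n, r)
```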

As in almost all of the Bayesian asymptotic literature, we have focused on sufficient conditions. Our 
conditions are practically appealing in allowing priors to be screened for their usefulness in high-dimensional 
settings. However, it would be of substantial interest to additionally provide theory allowing one to rule 
out the use of certain classes of priors in particular settings. 



4 Technical Details 

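Proof of Lemma 1. Noting that $\hat\beta_n = (X_n^T X_n)^{-1}X_n^T y_n$,
$E_{\beta_n^0}(\Phi_n) = \mathrm{pr}_{\beta_n^0}(\|\hat\beta_n - \beta_n^0\| > \epsilon/2) \le \mathrm{pr}_{\beta_n^0}\{\chi^2_{p_n} > \epsilon^2 n\Lambda_{\min}^2/(4\sigma^2)\}$
where $\chi^2_p$ is a chi-squared distributed random variable with $p$ degrees of freedom. The
inequality is attained using assumption (A2). Similarly,

$$
\begin{aligned}
\sup_{\beta_n\in\mathcal{B}_n} E_{\beta_n}(1-\Phi_n)
&\le \sup_{\beta_n\in\mathcal{B}_n}\mathrm{pr}_{\beta_n}\left(\left|\,\|\hat\beta_n-\beta_n\| - \|\beta_n^0-\beta_n\|\,\right| \le \epsilon/2\right) \\
&\le \sup_{\beta_n\in\mathcal{B}_n}\mathrm{pr}_{\beta_n}\left(\|\hat\beta_n - \beta_n\| \ge -\epsilon/2 + \|\beta_n^0 - \beta_n\|\right) \\
&\le \mathrm{pr}_{\beta_n}\left(\|\hat\beta_n-\beta_n\| > \epsilon/2\right)
\le \mathrm{pr}\{\chi^2_{p_n} > \epsilon^2 n \Lambda^2_{\min}/(4\sigma^2)\}.
\end{aligned}
$$

Simplifying the inequality $\mathrm{pr}\{\chi^2_p - p \ge 2(px)^{1/2} + 2x\} \le \exp(-x)$ by Laurent & Massart
(2000), we state that $\mathrm{pr}(\chi^2_p > x) \le \exp(-x/4)$ if $x \ge 8p$. Then, using assumption (A1), as $n \to \infty$,

$$
\begin{aligned}
E_{\beta_n^0}(\Phi_n) &\le \exp\{-\epsilon^2 n\Lambda_{\min}^2/(16\sigma^2)\}, \\
\sup_{\beta_n\in\mathcal{B}_n} E_{\beta_n}(1-\Phi_n) &\le \exp\{-\epsilon^2 n\Lambda_{\min}^2/(16\sigma^2)\}.
\end{aligned}
$$

This completes the proof. □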
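The simplified tail bound $\mathrm{pr}(\chi^2_p > x) \le \exp(-x/4)$ for $x \ge 8p$ can be checked exactly when $p$ is even, because the chi-squared survival function then has the closed form $e^{-x/2}\sum_{i<p/2}(x/2)^i/i!$. The sketch below does this on a small grid; the grid and the function names are hypothetical, and the restriction to even $p$ is only a convenience of this check.

```python
import math

def chi2_sf_even(x, p):
    """Exact pr(chi^2_p > x) for even p: exp(-x/2) * sum_{i < p/2} (x/2)^i / i!."""
    if p % 2 != 0:
        raise ValueError("the closed form used here requires even p")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, p // 2):
        term *= half / i        # (x/2)^i / i!, built up incrementally
        total += term
    return math.exp(-half) * total

def simplified_bound(x):
    """The bound exp(-x/4), claimed for x >= 8p."""
    return math.exp(-x / 4.0)

# Hypothetical grid of even degrees of freedom and thresholds with x >= 8p.
checks = [(p, x, chi2_sf_even(x, p), simplified_bound(x))
          for p in (2, 10, 50) for x in (8 * p, 12 * p, 20 * p)]
for p, x, exact, bound in checks:
    print(f"p={p:3d} x={x:5d} exact={exact:.3e} bound={bound:.3e}")
```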
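Proof of Theorem 1. The posterior probability of $\mathcal{B}_n$ is given by

$$
\begin{aligned}
\Pi_n(\mathcal{B}_n \mid y_n)
&= \frac{\int_{\mathcal{B}_n}\{f(y_n\mid\beta_n)/f(y_n\mid\beta_n^0)\}\,\Pi_n(d\beta_n)}{\int\{f(y_n\mid\beta_n)/f(y_n\mid\beta_n^0)\}\,\Pi_n(d\beta_n)} \\
&\le \Phi_n + \frac{(1-\Phi_n)J_{\mathcal{B}_n}}{J_n} \\
&= I_1 + I_2/J_n,
\end{aligned}
\tag{5}
$$

where $J_{\mathcal{B}_n} = \int_{\mathcal{B}_n}\{f(y_n\mid\beta_n)/f(y_n\mid\beta_n^0)\}\,\Pi_n(d\beta_n)$ and
$J_n = J_{\mathbb{R}^{p_n}}$. We need to show that $I_1 + I_2/J_n \to 0$ $\mathrm{pr}_{\beta_n^0}$-almost surely as
$n \to \infty$. Let $b = \epsilon^2\Lambda_{\min}^2/(16\sigma^2)$. For sufficiently large $n$,
$\mathrm{pr}_{\beta_n^0}\{I_1 \ge \exp(-bn/2)\} \le \exp(bn/2)E_{\beta_n^0}(I_1) \le \exp(-bn/2)$ using Lemma 1. This
implies that $\sum_{n=1}^{\infty}\mathrm{pr}_{\beta_n^0}\{I_1 \ge \exp(-bn/2)\} < \infty$ and hence by the
Borel-Cantelli lemma $\mathrm{pr}_{\beta_n^0}\{I_1 \ge \exp(-bn/2) \text{ infinitely often}\} = 0$. We next look at the
behavior of $I_2$:

$$
\begin{aligned}
E_{\beta_n^0}(I_2) &= E_{\beta_n^0}\{(1-\Phi_n)J_{\mathcal{B}_n}\} \\
&= \int_{\mathcal{B}_n}\int (1-\Phi_n)\,f(y_n\mid\beta_n)\,dy_n\,\Pi_n(d\beta_n) \\
&\le \Pi_n(\mathcal{B}_n)\sup_{\beta_n\in\mathcal{B}_n}E_{\beta_n}(1-\Phi_n) \\
&\le \exp(-bn).
\end{aligned}
$$

Then for sufficiently large $n$, $\mathrm{pr}_{\beta_n^0}\{I_2 \ge \exp(-bn/2)\} \le \exp(-bn/2)$ using Lemma 1. Again
$\sum_{n=1}^{\infty}\mathrm{pr}_{\beta_n^0}\{I_2 \ge \exp(-bn/2)\} < \infty$ and hence by the Borel-Cantelli lemma
$\mathrm{pr}_{\beta_n^0}\{I_2 \ge \exp(-bn/2) \text{ infinitely often}\} = 0$.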

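We have shown that both $I_1$ and $I_2$ tend towards zero exponentially fast. Now we analyze the behavior
of $J_n$. To complete the proof, we need to show that $\exp(bn/2)J_n \to \infty$
$\mathrm{pr}_{\beta_n^0}$-almost surely as $n \to \infty$.

$$
\begin{aligned}
\exp(bn/2)J_n &= \exp(bn/2)\int \exp\left\{-n\,\frac{1}{n}\log\frac{f(y_n\mid\beta_n^0)}{f(y_n\mid\beta_n)}\right\}\Pi_n(d\beta_n) \\
&\ge \exp\{(b/2-\nu)n\}\,\Pi_n(\mathcal{D}_{n,\nu})
\end{aligned}
\tag{6}
$$

where $\mathcal{D}_{n,\nu} = \{\beta_n : n^{-1}\log\{f(y_n\mid\beta_n^0)/f(y_n\mid\beta_n)\} < \nu\} = \{\beta_n : n^{-1}(\|y_n - X_n\beta_n\|^2 - \|y_n - X_n\beta_n^0\|^2) < 2\sigma^2\nu\}$
for any $0 < \nu < b/2$. Then
$\Pi_n(\mathcal{D}_{n,\nu}) \ge \Pi_n\{\beta_n : n^{-1}|\,\|y_n - X_n\beta_n\|^2 - \|y_n - X_n\beta_n^0\|^2\,| < 2\sigma^2\nu\}$.
Using the identity $x^2 - x_0^2 = 2x_0(x-x_0) + (x-x_0)^2$ for all $x, x_0 \in \mathbb{R}$,

$$
\begin{aligned}
\Pi_n(\mathcal{D}_{n,\nu})
&\ge \Pi_n\left\{\beta_n : n^{-1}\left|\,2\|y_n - X_n\beta_n^0\|\left(\|y_n - X_n\beta_n\| - \|y_n - X_n\beta_n^0\|\right) + \left(\|y_n - X_n\beta_n\| - \|y_n - X_n\beta_n^0\|\right)^2\right| < 2\sigma^2\nu\right\} \\
&\ge \Pi_n\left\{\beta_n : n^{-1}\left(2\|y_n - X_n\beta_n^0\|\,\|X_n\beta_n - X_n\beta_n^0\| + \|X_n\beta_n - X_n\beta_n^0\|^2\right) < 2\sigma^2\nu\right\} \\
&\ge \Pi_n\left(\beta_n : n^{-1}\|X_n\beta_n - X_n\beta_n^0\| < \frac{2\sigma^2\nu}{3\kappa_n},\ \|X_n\beta_n - X_n\beta_n^0\| < \kappa_n\right)
\end{aligned}
\tag{7}
$$

given that $\|y_n - X_n\beta_n^0\| \le \kappa_n$. For $\kappa_n = n^{(1+\rho)/2}$ with $\rho > 0$ and $\kappa_n^2/\sigma^2 > 8n$,
$\mathrm{pr}_{\beta_n^0}(y_n : \|y_n - X_n\beta_n^0\| > \kappa_n) = \mathrm{pr}_{\beta_n^0}(y_n : \chi^2_n > \kappa_n^2/\sigma^2) \le \exp\{-\kappa_n^2/(4\sigma^2)\}$.
Since $\sum_{n=1}^{\infty}\mathrm{pr}_{\beta_n^0}(y_n : \|y_n - X_n\beta_n^0\| > \kappa_n) < \infty$, by the
Borel-Cantelli lemma $\mathrm{pr}_{\beta_n^0}(y_n : \|y_n - X_n\beta_n^0\| > \kappa_n \text{ infinitely often}) = 0$.
Following from (7) and the fact that $\kappa_n \to \infty$ as $n \to \infty$, for sufficiently large $n$,
$\Pi_n(\mathcal{D}_{n,\nu}) \ge \Pi_n\{\beta_n : n^{-1}\|X_n\beta_n - X_n\beta_n^0\| < 2\sigma^2\nu/(3\kappa_n)\} \ge \Pi_n(\beta_n : \|\beta_n - \beta_n^0\| < \Delta/n^{\rho/2})$,
where $\Delta = 2\sigma^2\nu/(3\Lambda_{\max}^2)$. Hence following (6),
$\Pi_n(\mathcal{B}_n \mid y_n) \to 0$ $\mathrm{pr}_{\beta_n^0}$-almost surely as $n \to \infty$ if
$\Pi_n(\beta_n : \|\beta_n - \beta_n^0\| < \Delta/n^{\rho/2}) > \exp(-dn)$ for all $0 < d < b/2 - \nu$. This completes the
proof. □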
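The passage from the squared-norm difference to the two conditions in (7) compresses two elementary steps. They can be sketched as follows, writing $u = X_n\beta_n - X_n\beta_n^0$ (a shorthand introduced here, not used in the original argument) and working on the event $\|y_n - X_n\beta_n^0\| \le \kappa_n$.

```latex
% Shorthand introduced for this sketch only: u = X_n\beta_n - X_n\beta_n^0.
% Step 1 (reverse triangle inequality):
\[
  \bigl|\,\|y_n - X_n\beta_n\| - \|y_n - X_n\beta_n^0\|\,\bigr| \le \|u\| .
\]
% Step 2: if \|u\| < \kappa_n and n^{-1}\|u\| < 2\sigma^2\nu/(3\kappa_n), then
\begin{align*}
  n^{-1}\bigl(2\|y_n - X_n\beta_n^0\|\,\|u\| + \|u\|^2\bigr)
    &\le n^{-1}\bigl(2\kappa_n\|u\| + \kappa_n\|u\|\bigr) \\
    &= 3\kappa_n\, n^{-1}\|u\| \\
    &< 2\sigma^2\nu .
\end{align*}
```

So any $\beta_n$ satisfying both conditions in (7) also satisfies the condition on the preceding line, which is why the prior mass can only decrease along the chain of inequalities.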

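Proof of Theorem 2. We need to calculate the probability assigned to the region
$\{\beta_n : \|\beta_n - \beta_n^0\| < \Delta/n^{\rho/2}\}$ under the Laplace prior.

$$
\begin{aligned}
\Pi_n\left(\beta_n : \|\beta_n - \beta_n^0\| < \frac{\Delta}{n^{\rho/2}}\right)
&\ge \prod_{j\in\mathcal{A}_n}\Pi_n\left(\beta_{nj} : |\beta_{nj} - \beta_{nj}^0| < \frac{\Delta}{\sqrt{p_n}\,n^{\rho/2}}\right)
\Pi_n\left\{\sum_{j\notin\mathcal{A}_n}\beta_{nj}^2 < \frac{(p_n - q_n)\Delta^2}{p_n n^{\rho}}\right\} \\
&\ge \prod_{j\in\mathcal{A}_n}\Pi_n\left(\beta_{nj} : |\beta_{nj} - \beta_{nj}^0| < \frac{\Delta}{\sqrt{p_n}\,n^{\rho/2}}\right)
\left\{1 - \frac{p_n n^{\rho}E(\beta_{nj}^2)}{\Delta^2}\right\},
\end{aligned}
\tag{8}
$$

where the second inequality is Markov's inequality and $E(\beta_{nj}^2)$ can be verified to be $2s_n^2$. Following from (8),

$$
\Pi_n\left(\beta_n : \|\beta_n - \beta_n^0\| < \frac{\Delta}{n^{\rho/2}}\right)
\ge \left(1 - \frac{2p_n n^{\rho}s_n^2}{\Delta^2}\right)
\left\{\frac{\Delta}{\sqrt{p_n}\,n^{\rho/2}s_n}\exp\left(-\frac{\sup_{j\in\mathcal{A}_n}|\beta_{nj}^0|}{s_n} - \frac{\Delta}{\sqrt{p_n}\,n^{\rho/2}s_n}\right)\right\}^{q_n}.
\tag{9}
$$

Taking the negative logarithm of both sides of (9) and letting $s_n = C/(\sqrt{p_n}\,n^{\rho/2}\log n)$ for some $C > 0$, we
obtain

$$
\begin{aligned}
-\log\Pi_n\left(\beta_n : \|\beta_n - \beta_n^0\| < \frac{\Delta}{n^{\rho/2}}\right)
\le{}& -q_n\log\Delta + q_n\log C - q_n\log\log n
- \log\left\{1 - \frac{2C^2}{\Delta^2(\log n)^2}\right\} \\
& + \frac{q_n\Delta\log n}{C}
+ \frac{q_n\sqrt{p_n}\,n^{\rho/2}\log n\,\sup_{j\in\mathcal{A}_n}|\beta_{nj}^0|}{C}
\end{aligned}
\tag{10}
$$

as $n \to \infty$. It is easy to see that the dominating term in (10) is the last one, which is $o(n)$ by
assumptions (A3) and (A4), and hence $-\log\Pi_n(\beta_n : \|\beta_n - \beta_n^0\| < \Delta/n^{\rho/2}) < dn$ for all
$d > 0$. This completes the proof. □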

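Proof of Theorem 3. $E(\beta_{nj}^2)$, in this case, is given by $d_0 s_n^2/(d_0 - 2)$. For the sake of simplicity, we let
$d_0 = 3$. Then following from (8)

$$
\Pi_n\left(\beta_n : \|\beta_n - \beta_n^0\| < \frac{\Delta}{n^{\rho/2}}\right)
\ge \left(1 - \frac{3p_n n^{\rho}s_n^2}{\Delta^2}\right)
\left[\frac{2\Delta}{\sqrt{p_n}\,n^{\rho/2}s_n\sqrt{3}\,B(1/2, 3/2)}
\left\{1 + \frac{2\sup_{j\in\mathcal{A}_n}(\beta_{nj}^0)^2}{3s_n^2} + \frac{2\Delta^2}{3s_n^2 p_n n^{\rho}}\right\}^{-2}\right]^{q_n}.
\tag{11}
$$

Taking the negative logarithm of both sides of (11) and letting $s_n = C/(\sqrt{p_n}\,n^{\rho/2}\log n)$ for some $C > 0$,
we obtain

$$
\begin{aligned}
-\log\Pi_n\left(\beta_n : \|\beta_n - \beta_n^0\| < \frac{\Delta}{n^{\rho/2}}\right)
\le{}& -q_n\log\left\{\frac{2\Delta}{\sqrt{3}\,B(1/2, 3/2)}\right\} + q_n\log C - q_n\log\log n
- \log\left\{1 - \frac{3C^2}{\Delta^2(\log n)^2}\right\} \\
& + 2q_n\log\left\{1 + \frac{2p_n n^{\rho}(\log n)^2\sup_{j\in\mathcal{A}_n}(\beta_{nj}^0)^2}{3C^2} + \frac{2\Delta^2(\log n)^2}{3C^2}\right\}
\end{aligned}
\tag{12}
$$

as $n \to \infty$. It is easy to see that the dominating term in (12) is the last one and
$-\log\Pi_n(\beta_n : \|\beta_n - \beta_n^0\| < \Delta/n^{\rho/2}) < dn$ for all $d > 0$. The result can be easily shown to
hold for all $d_0 \in (2, \infty)$. This completes the proof. □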

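Proof of Theorem 4. $E(\beta_{nj}^2)$, in this case, can be verified to be $2\eta_n^2/(\alpha^2 - 3\alpha + 2)$ for
$\alpha > 2$. For the sake of simplicity, we let $\alpha = 3$. Then following from (8)

$$
\Pi_n\left(\beta_n : \|\beta_n - \beta_n^0\| < \frac{\Delta}{n^{\rho/2}}\right)
\ge \left(1 - \frac{p_n n^{\rho}\eta_n^2}{\Delta^2}\right)
\left[\frac{3\Delta}{\sqrt{p_n}\,n^{\rho/2}\eta_n}
\left\{1 + \frac{\sup_{j\in\mathcal{A}_n}|\beta_{nj}^0|}{\eta_n} + \frac{\Delta}{\sqrt{p_n}\,n^{\rho/2}\eta_n}\right\}^{-4}\right]^{q_n}.
\tag{13}
$$

Taking the negative logarithm of both sides of (13) and letting $\eta_n = C/(\sqrt{p_n}\,n^{\rho/2}\log n)$ for some $C > 0$,
we obtain

$$
\begin{aligned}
-\log\Pi_n\left(\beta_n : \|\beta_n - \beta_n^0\| < \frac{\Delta}{n^{\rho/2}}\right)
\le{}& -q_n\log 3\Delta - 3q_n\log C - q_n\log\log n
- \log\left\{1 - \frac{C^2}{\Delta^2(\log n)^2}\right\} \\
& + 4q_n\log\left(C + \Delta\log n + \sqrt{p_n}\,n^{\rho/2}\log n\,\sup_{j\in\mathcal{A}_n}|\beta_{nj}^0|\right)
\end{aligned}
\tag{14}
$$

as $n \to \infty$. It is easy to see that the dominating term in (14) is the last one and
$-\log\Pi_n(\beta_n : \|\beta_n - \beta_n^0\| < \Delta/n^{\rho/2}) < dn$ for all $d > 0$. The result can be easily shown to
hold for all $\alpha \in (2, \infty)$. This completes the proof. □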

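Proof of Theorem 5. Similarly to the previous cases, we can show that
$E(\beta_{nj}^2) = \xi_n\Gamma(a_0+1)\Gamma(b_0-1)/\{\Gamma(a_0)\Gamma(b_0)\}$. Then following from (8)

$$
\Pi_n\left(\beta_n : \|\beta_n - \beta_n^0\| < \frac{\Delta}{n^{\rho/2}}\right)
\ge \left\{1 - \frac{p_n n^{\rho}E(\beta_{nj}^2)}{\Delta^2}\right\}
\left[\frac{2\Delta}{\sqrt{p_n}\,n^{\rho/2}}\cdot
\frac{U\{b_0 + 1/2,\ 3/2 - a_0,\ \sup_{j\in\mathcal{A}_n}(\beta_{nj}^0)^2/\xi_n + \Delta^2/(p_n n^{\rho}\xi_n)\}}
{(2\pi\xi_n)^{1/2}\,\Gamma(a_0)\Gamma(b_0)\Gamma(b_0+1/2)^{-1}\Gamma(a_0+b_0)^{-1}}\right]^{q_n}.
\tag{15}
$$

We can use the expansion
$U(a, b, z) = z^{-a}\{\sum_{m=0}^{R-1}(a)_m(1+a-b)_m(-z)^{-m}/m! + O(|z|^{-R})\}$ for large $z$, where
$(a)_m = a(a+1)\cdots(a+m-1)$ and the $R$th term is the smallest in the expansion (Abramowitz & Stegun,
193). Letting $R = 1$, for sufficiently large $n$, (15) can be further bounded as

$$
\Pi_n\left(\beta_n : \|\beta_n - \beta_n^0\| < \frac{\Delta}{n^{\rho/2}}\right)
\ge \left\{1 - \frac{p_n n^{\rho}E(\beta_{nj}^2)}{\Delta^2}\right\}
\left[\frac{\sqrt{2}\,\Delta\,\Gamma(b_0+1/2)\Gamma(a_0+b_0)}
{\sqrt{p_n}\,n^{\rho/2}\sqrt{\xi_n\pi}\,\Gamma(a_0)\Gamma(b_0)\{\sup_{j\in\mathcal{A}_n}(\beta_{nj}^0)^2/\xi_n + \Delta^2/(p_n n^{\rho}\xi_n)\}^{(b_0+1/2)}}\right]^{q_n}.
\tag{16}
$$

Taking the negative logarithm of both sides of (16) and letting $\xi_n = C/(p_n n^{\rho}\log n)$ for some $C > 0$, we
obtain

$$
\begin{aligned}
-\log\Pi_n\left(\beta_n : \|\beta_n - \beta_n^0\| < \frac{\Delta}{n^{\rho/2}}\right)
\le{}& -q_n\log\left\{\frac{\sqrt{2}\,\Delta\,\Gamma(b_0+1/2)\Gamma(a_0+b_0)}{\sqrt{C\pi}\,\Gamma(a_0)\Gamma(b_0)}\right\}
- \log\left\{1 - \frac{C\,\Gamma(a_0+1)\Gamma(b_0-1)}{\log n\,\Delta^2\,\Gamma(a_0)\Gamma(b_0)}\right\} \\
& - \frac{q_n}{2}\log\log n
+ q_n\left(b_0 + \frac{1}{2}\right)\log\left\{\frac{p_n n^{\rho}\log n\,\sup_{j\in\mathcal{A}_n}(\beta_{nj}^0)^2}{C} + \frac{\Delta^2\log n}{C}\right\}
\end{aligned}
\tag{17}
$$

as $n \to \infty$. It is easy to see that the dominating term in (17) is the last one and
$-\log\Pi_n(\beta_n : \|\beta_n - \beta_n^0\| < \Delta/n^{\rho/2}) < dn$ for all $d > 0$. This completes the proof. □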
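The leading-term approximation $U(a, b, z) \approx z^{-a}$ for large $z$, used with $R = 1$ in the proof above, can be examined numerically through the integral representation $U(a,b,z) = \Gamma(a)^{-1}\int_0^\infty e^{-zt}t^{a-1}(1+t)^{b-a-1}\,dt$. The sketch below is a crude Simpson-rule evaluation that is adequate only for large $z$ and $a > 1$; the truncation point, step count, parameter values ($b_0 = 2$, $a_0 = 1$) and the function name `kummer_u` are all assumptions of this illustration.

```python
import math

def kummer_u(a, b, z, upper=2.0, steps=20_000):
    """Approximate U(a, b, z) = (1/Gamma(a)) * int_0^inf exp(-z t) t^(a-1) (1+t)^(b-a-1) dt
    by Simpson's rule on [0, upper]; intended only for large z and a > 1."""
    def integrand(t):
        return math.exp(-z * t) * t ** (a - 1.0) * (1.0 + t) ** (b - a - 1.0)
    h = upper / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0 / math.gamma(a)

# Hypothetical prior parameters b0 = 2, a0 = 1 give a = b0 + 1/2 and b = 3/2 - a0.
a, b = 2.5, 0.5
ratio_50 = kummer_u(a, b, 50.0) * 50.0 ** a     # U(a, b, z) / z^(-a) at z = 50
ratio_500 = kummer_u(a, b, 500.0) * 500.0 ** a  # same ratio at z = 500
print(ratio_50, ratio_500)
```

In this setting the ratio $U(a,b,z)/z^{-a}$ moves towards one as $z$ grows, which is the regime reached in the proof because $\xi_n \to 0$.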